Overview

Dataset statistics

                               Dataset A  Dataset B
Number of variables            12         12
Number of observations         446        446
Missing cells                  448        434
Missing cells (%)              8.4%       8.1%
Duplicate rows                 0          0
Duplicate rows (%)             0.0%       0.0%
Total size in memory           45.3 KiB   45.3 KiB
Average record size in memory  104.0 B    104.0 B
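
The missing-cell percentages can be reproduced from the other rows of this table, assuming the percentage is defined as missing cells divided by the total number of cells (variables × observations). The report does not state the formula; the function name below is illustrative only.

```python
def missing_cell_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Percentage of missing cells, assuming total cells = variables * observations."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


# Counts copied from the Dataset statistics table.
pct_a = missing_cell_pct(448, 12, 446)
pct_b = missing_cell_pct(434, 12, 446)
print(f"Dataset A: {pct_a:.1f}%  Dataset B: {pct_b:.1f}%")
```

Under this assumption both values round to the percentages shown in the table.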

Variable types

             Dataset A  Dataset B
Numeric      5          5
Categorical  4          4
Text         3          3

Alerts

Dataset A                             Dataset B                                       Alert type
Age has 86 (19.3%) missing values     Age has 87 (19.5%) missing values               Missing
Cabin has 362 (81.2%) missing values  Cabin has 345 (77.4%) missing values            Missing
PassengerId has unique values         PassengerId has unique values                   Unique
Name has unique values                Name has unique values                          Unique
SibSp has 297 (66.6%) zeros           SibSp has 312 (70.0%) zeros                     Zeros
Parch has 339 (76.0%) zeros           Parch has 344 (77.1%) zeros                     Zeros
Alert not present in this dataset     Survived is highly overall correlated with Sex  High Correlation
Alert not present in this dataset     Sex is highly overall correlated with Survived  High Correlation
Alert not present in this dataset     Fare has 5 (1.1%) zeros                         Zeros

Reproduction

                        Dataset A                   Dataset B
Analysis started        2023-10-10 11:13:35.413766  2023-10-10 11:13:43.170087
Analysis finished       2023-10-10 11:13:43.167627  2023-10-10 11:13:49.849362
Duration                7.75 seconds                6.68 seconds
Software version        ydata-profiling v0.0.dev0   ydata-profiling v0.0.dev0
Download configuration  config.json                 config.json
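
The reported durations follow from the start and finish timestamps. A small sketch using only the standard library; the format string is an assumption that matches the timestamps as printed in this report.

```python
from datetime import datetime

# Assumed timestamp layout, matching the values printed above.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


dur_a = duration_seconds("2023-10-10 11:13:35.413766", "2023-10-10 11:13:43.167627")
dur_b = duration_seconds("2023-10-10 11:13:43.170087", "2023-10-10 11:13:49.849362")
print(f"Dataset A: {dur_a:.2f} s  Dataset B: {dur_b:.2f} s")
```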

Variables

PassengerId
Real number (ℝ)

               Dataset A  Dataset B
Distinct       446        446
Distinct (%)   100.0%     100.0%
Missing        0          0
Missing (%)    0.0%       0.0%
Infinite       0          0
Infinite (%)   0.0%       0.0%
Mean           457.25112  451.40135
Minimum        2          2
Maximum        891        891
Zeros          0          0
Zeros (%)      0.0%       0.0%
Negative       0          0
Negative (%)   0.0%       0.0%
Memory size    7.0 KiB    7.0 KiB

Quantile statistics

                           Dataset A  Dataset B
Minimum                    2          2
5-th percentile            51         61.25
Q1                         224.25     241.5
median                     467        451
Q3                         679.75     664.75
95-th percentile           846.75     843.5
Maximum                    891        891
Range                      889        889
Interquartile range (IQR)  455.5      423.25
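
The Range and IQR rows are derived from the other quantiles (maximum minus minimum, and Q3 minus Q1). A minimal check using the PassengerId values above; the dictionary layout and function name are illustrative.

```python
# Quantiles copied from the PassengerId quantile table.
quantiles = {
    "Dataset A": {"min": 2, "q1": 224.25, "q3": 679.75, "max": 891},
    "Dataset B": {"min": 2, "q1": 241.5, "q3": 664.75, "max": 891},
}


def spread(q: dict) -> dict:
    """Range (max - min) and interquartile range (Q3 - Q1)."""
    return {"range": q["max"] - q["min"], "iqr": q["q3"] - q["q1"]}


spreads = {name: spread(q) for name, q in quantiles.items()}
for name, s in spreads.items():
    print(name, s)
```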

Descriptive statistics

                                 Dataset A      Dataset B
Standard deviation               256.16883      252.32049
Coefficient of variation (CV)    0.56023663     0.55897151
Kurtosis                         -1.2043126     -1.185294
Mean                             457.25112      451.40135
Median Absolute Deviation (MAD)  223            212.5
Skewness                         -0.064891381   -0.00038955837
Sum                              203934         201325
Variance                         65622.467      63665.63
Monotonicity                     Not monotonic  Not monotonic
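
Several rows in this table are functions of the others: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of observations (446, with no missing values for PassengerId). The sketch below recomputes them from the rounded figures, so small differences in the last digits are expected.

```python
# Rounded figures copied from the PassengerId tables.
stats = {
    "Dataset A": {"mean": 457.25112, "std": 256.16883, "n": 446},
    "Dataset B": {"mean": 451.40135, "std": 252.32049, "n": 446},
}


def derived(s: dict) -> dict:
    """Quantities that follow from mean, standard deviation and count."""
    return {
        "cv": s["std"] / s["mean"],   # coefficient of variation
        "variance": s["std"] ** 2,    # variance is the squared std
        "sum": s["mean"] * s["n"],    # sum = mean * number of observations
    }


derived_stats = {name: derived(s) for name, s in stats.items()}
for name, d in derived_stats.items():
    print(name, d)
```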
Histogram with fixed size bins (bins=50)
Dataset A
Value               Count  Frequency (%)
362                 1      0.2%
529                 1      0.2%
173                 1      0.2%
176                 1      0.2%
521                 1      0.2%
855                 1      0.2%
608                 1      0.2%
767                 1      0.2%
123                 1      0.2%
207                 1      0.2%
Other values (436)  436    97.8%

Dataset B
Value               Count  Frequency (%)
581                 1      0.2%
244                 1      0.2%
474                 1      0.2%
261                 1      0.2%
396                 1      0.2%
130                 1      0.2%
83                  1      0.2%
836                 1      0.2%
348                 1      0.2%
683                 1      0.2%
Other values (436)  436    97.8%

Dataset A
Value  Count  Frequency (%)
2      1      0.2%
3      1      0.2%
4      1      0.2%
5      1      0.2%
6      1      0.2%
8      1      0.2%
9      1      0.2%
12     1      0.2%
16     1      0.2%
17     1      0.2%

Dataset B
Value  Count  Frequency (%)
2      1      0.2%
6      1      0.2%
7      1      0.2%
10     1      0.2%
13     1      0.2%
14     1      0.2%
15     1      0.2%
16     1      0.2%
19     1      0.2%
24     1      0.2%

Survived
Categorical

              Dataset A  Dataset B
Distinct      2          2
Distinct (%)  0.4%       0.4%
Missing       0          0
Missing (%)   0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Value  Dataset A  Dataset B
0      286        272
1      160        174

Length

               Dataset A  Dataset B
Max length     1          1
Median length  1          1
Mean length    1          1
Min length     1          1

Characters and Unicode

                     Dataset A  Dataset B
Total characters     446        446
Distinct characters  2          2
Distinct categories  1          1
Distinct scripts     1          1
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A  Dataset B
Unique      0          0
Unique (%)  0.0%       0.0%

Sample

         Dataset A  Dataset B
1st row  0          1
2nd row  0          1
3rd row  0          0
4th row  1          0
5th row  0          0

Common Values

Dataset A
Value  Count  Frequency (%)
0      286    64.1%
1      160    35.9%

Dataset B
Value  Count  Frequency (%)
0      272    61.0%
1      174    39.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: one plot each for Dataset A and Dataset B]
ValueCountFrequency (%)
0 286
64.1%
1 160
35.9%
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%

Most occurring characters

ValueCountFrequency (%)
0 286
64.1%
1 160
35.9%
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 286
64.1%
1 160
35.9%
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 286
64.1%
1 160
35.9%
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 286
64.1%
1 160
35.9%
ValueCountFrequency (%)
0 272
61.0%
1 174
39.0%

Pclass
Categorical

              Dataset A  Dataset B
Distinct      3          3
Distinct (%)  0.7%       0.7%
Missing       0          0
Missing (%)   0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Value  Dataset A  Dataset B
3      263        255
2      96         90
1      87         101

Length

               Dataset A  Dataset B
Max length     1          1
Median length  1          1
Mean length    1          1
Min length     1          1

Characters and Unicode

                     Dataset A  Dataset B
Total characters     446        446
Distinct characters  3          3
Distinct categories  1          1
Distinct scripts     1          1
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A  Dataset B
Unique      0          0
Unique (%)  0.0%       0.0%

Sample

         Dataset A  Dataset B
1st row  2          2
2nd row  2          2
3rd row  2          3
4th row  3          1
5th row  3          3

Common Values

Dataset A
Value  Count  Frequency (%)
3      263    59.0%
2      96     21.5%
1      87     19.5%

Dataset B
Value  Count  Frequency (%)
3      255    57.2%
1      101    22.6%
2      90     20.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: one plot each for Dataset A and Dataset B]
ValueCountFrequency (%)
3 263
59.0%
2 96
 
21.5%
1 87
 
19.5%
ValueCountFrequency (%)
3 255
57.2%
1 101
 
22.6%
2 90
 
20.2%

Most occurring characters

ValueCountFrequency (%)
3 263
59.0%
2 96
 
21.5%
1 87
 
19.5%
ValueCountFrequency (%)
3 255
57.2%
1 101
 
22.6%
2 90
 
20.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 263
59.0%
2 96
 
21.5%
1 87
 
19.5%
ValueCountFrequency (%)
3 255
57.2%
1 101
 
22.6%
2 90
 
20.2%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 263
59.0%
2 96
 
21.5%
1 87
 
19.5%
ValueCountFrequency (%)
3 255
57.2%
1 101
 
22.6%
2 90
 
20.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 263
59.0%
2 96
 
21.5%
1 87
 
19.5%
ValueCountFrequency (%)
3 255
57.2%
1 101
 
22.6%
2 90
 
20.2%

Name
Text

              Dataset A  Dataset B
Distinct      446        446
Distinct (%)  100.0%     100.0%
Missing       0          0
Missing (%)   0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Length

               Dataset A  Dataset B
Max length     82         82
Median length  50         51
Mean length    26.838565  26.699552
Min length     12         12

Characters and Unicode

                     Dataset A  Dataset B
Total characters     11970      11908
Distinct characters  59         58
Distinct categories  7          7
Distinct scripts     2          2
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A  Dataset B
Unique      446        446
Unique (%)  100.0%     100.0%

Sample

         Dataset A                                       Dataset B
1st row  del Carlo, Mr. Sebastiano                       Christy, Miss. Julie Rachel
2nd row  Givard, Mr. Hans Kristensen                     Brown, Miss. Amelia "Mildred"
3rd row  Hold, Mr. Stephen                               Meek, Mrs. Thomas (Annie Louise Rowley)
4th row  Goldsmith, Mrs. Frank John (Emily Alice Brown)  Newell, Mr. Arthur Webster
5th row  Panula, Mr. Jaako Arnold                        Ford, Mr. William Neal
ValueCountFrequency (%)
mr 269
 
14.8%
miss 84
 
4.6%
mrs 68
 
3.7%
william 27
 
1.5%
john 24
 
1.3%
master 19
 
1.0%
henry 17
 
0.9%
george 14
 
0.8%
james 13
 
0.7%
mary 13
 
0.7%
Other values (887) 1272
69.9%
ValueCountFrequency (%)
mr 259
 
14.3%
miss 92
 
5.1%
mrs 66
 
3.6%
william 29
 
1.6%
john 22
 
1.2%
master 18
 
1.0%
henry 14
 
0.8%
james 13
 
0.7%
thomas 13
 
0.7%
charles 10
 
0.6%
Other values (903) 1276
70.4%

Most occurring characters

ValueCountFrequency (%)
1376
 
11.5%
r 1007
 
8.4%
e 848
 
7.1%
a 811
 
6.8%
i 660
 
5.5%
n 641
 
5.4%
s 629
 
5.3%
M 553
 
4.6%
l 522
 
4.4%
o 504
 
4.2%
Other values (49) 4419
36.9%
ValueCountFrequency (%)
1368
 
11.5%
r 942
 
7.9%
a 852
 
7.2%
e 824
 
6.9%
s 662
 
5.6%
i 652
 
5.5%
n 638
 
5.4%
M 576
 
4.8%
l 511
 
4.3%
o 505
 
4.2%
Other values (48) 4378
36.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7648
63.9%
Uppercase Letter 1823
 
15.2%
Space Separator 1376
 
11.5%
Other Punctuation 962
 
8.0%
Open Punctuation 79
 
0.7%
Close Punctuation 79
 
0.7%
Dash Punctuation 3
 
< 0.1%
ValueCountFrequency (%)
Lowercase Letter 7608
63.9%
Uppercase Letter 1824
 
15.3%
Space Separator 1368
 
11.5%
Other Punctuation 954
 
8.0%
Open Punctuation 73
 
0.6%
Close Punctuation 73
 
0.6%
Dash Punctuation 8
 
0.1%

Most frequent character per category

Space Separator
ValueCountFrequency (%)
1376
100.0%
ValueCountFrequency (%)
1368
100.0%
Lowercase Letter
ValueCountFrequency (%)
r 1007
13.2%
e 848
11.1%
a 811
10.6%
i 660
8.6%
n 641
8.4%
s 629
8.2%
l 522
 
6.8%
o 504
 
6.6%
t 343
 
4.5%
h 256
 
3.3%
Other values (16) 1427
18.7%
ValueCountFrequency (%)
r 942
12.4%
a 852
11.2%
e 824
10.8%
s 662
8.7%
i 652
8.6%
n 638
8.4%
l 511
 
6.7%
o 505
 
6.6%
t 315
 
4.1%
h 252
 
3.3%
Other values (16) 1455
19.1%
Uppercase Letter
ValueCountFrequency (%)
M 553
30.3%
J 116
 
6.4%
A 113
 
6.2%
H 99
 
5.4%
C 90
 
4.9%
E 89
 
4.9%
S 87
 
4.8%
W 70
 
3.8%
B 69
 
3.8%
L 66
 
3.6%
Other values (15) 471
25.8%
ValueCountFrequency (%)
M 576
31.6%
A 115
 
6.3%
J 113
 
6.2%
H 97
 
5.3%
S 94
 
5.2%
C 91
 
5.0%
E 82
 
4.5%
B 77
 
4.2%
W 66
 
3.6%
L 61
 
3.3%
Other values (14) 452
24.8%
Other Punctuation
ValueCountFrequency (%)
, 446
46.4%
. 446
46.4%
" 66
 
6.9%
' 4
 
0.4%
ValueCountFrequency (%)
, 446
46.8%
. 446
46.8%
" 54
 
5.7%
' 8
 
0.8%
Open Punctuation
ValueCountFrequency (%)
( 79
100.0%
ValueCountFrequency (%)
( 73
100.0%
Close Punctuation
ValueCountFrequency (%)
) 79
100.0%
ValueCountFrequency (%)
) 73
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%
ValueCountFrequency (%)
- 8
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9471
79.1%
Common 2499
 
20.9%
ValueCountFrequency (%)
Latin 9432
79.2%
Common 2476
 
20.8%

Most frequent character per script

Common
ValueCountFrequency (%)
1376
55.1%
, 446
 
17.8%
. 446
 
17.8%
( 79
 
3.2%
) 79
 
3.2%
" 66
 
2.6%
' 4
 
0.2%
- 3
 
0.1%
ValueCountFrequency (%)
1368
55.3%
, 446
 
18.0%
. 446
 
18.0%
( 73
 
2.9%
) 73
 
2.9%
" 54
 
2.2%
- 8
 
0.3%
' 8
 
0.3%
Latin
ValueCountFrequency (%)
r 1007
 
10.6%
e 848
 
9.0%
a 811
 
8.6%
i 660
 
7.0%
n 641
 
6.8%
s 629
 
6.6%
M 553
 
5.8%
l 522
 
5.5%
o 504
 
5.3%
t 343
 
3.6%
Other values (41) 2953
31.2%
ValueCountFrequency (%)
r 942
 
10.0%
a 852
 
9.0%
e 824
 
8.7%
s 662
 
7.0%
i 652
 
6.9%
n 638
 
6.8%
M 576
 
6.1%
l 511
 
5.4%
o 505
 
5.4%
t 315
 
3.3%
Other values (40) 2955
31.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11970
100.0%
ValueCountFrequency (%)
ASCII 11908
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1376
 
11.5%
r 1007
 
8.4%
e 848
 
7.1%
a 811
 
6.8%
i 660
 
5.5%
n 641
 
5.4%
s 629
 
5.3%
M 553
 
4.6%
l 522
 
4.4%
o 504
 
4.2%
Other values (49) 4419
36.9%
ValueCountFrequency (%)
1368
 
11.5%
r 942
 
7.9%
a 852
 
7.2%
e 824
 
6.9%
s 662
 
5.6%
i 652
 
5.5%
n 638
 
5.4%
M 576
 
4.8%
l 511
 
4.3%
o 505
 
4.2%
Other values (48) 4378
36.8%

Sex
Categorical

              Dataset A  Dataset B
Distinct      2          2
Distinct (%)  0.4%       0.4%
Missing       0          0
Missing (%)   0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Value   Dataset A  Dataset B
male    296        286
female  150        160

Length

               Dataset A  Dataset B
Max length     6          6
Median length  4          4
Mean length    4.6726457  4.7174888
Min length     4          4
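
The mean length can be recovered from the category counts reported for Sex: each "male" contributes 4 characters and each "female" 6. The same computation yields the total character count listed for this variable. A short sketch; the function name is illustrative.

```python
# Category counts copied from the Sex summary.
counts_a = {"male": 296, "female": 150}
counts_b = {"male": 286, "female": 160}


def length_stats(counts: dict) -> tuple:
    """Return (total characters, mean length) for a categorical column."""
    total_chars = sum(len(value) * n for value, n in counts.items())
    n_rows = sum(counts.values())
    return total_chars, total_chars / n_rows


total_a, mean_a = length_stats(counts_a)
total_b, mean_b = length_stats(counts_b)
print(total_a, round(mean_a, 7))
print(total_b, round(mean_b, 7))
```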

Characters and Unicode

                     Dataset A  Dataset B
Total characters     2084       2104
Distinct characters  5          5
Distinct categories  1          1
Distinct scripts     1          1
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A  Dataset B
Unique      0          0
Unique (%)  0.0%       0.0%

Sample

         Dataset A  Dataset B
1st row  male       female
2nd row  male       female
3rd row  male       female
4th row  female     male
5th row  male       male

Common Values

Dataset A
Value   Count  Frequency (%)
male    296    66.4%
female  150    33.6%

Dataset B
Value   Count  Frequency (%)
male    286    64.1%
female  160    35.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: one plot each for Dataset A and Dataset B]
ValueCountFrequency (%)
male 296
66.4%
female 150
33.6%
ValueCountFrequency (%)
male 286
64.1%
female 160
35.9%

Most occurring characters

ValueCountFrequency (%)
e 596
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 150
 
7.2%
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2084
100.0%
ValueCountFrequency (%)
Lowercase Letter 2104
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 596
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 150
 
7.2%
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 2084
100.0%
ValueCountFrequency (%)
Latin 2104
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 596
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 150
 
7.2%
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2084
100.0%
ValueCountFrequency (%)
ASCII 2104
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 596
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 150
 
7.2%
ValueCountFrequency (%)
e 606
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 160
 
7.6%

Age
Real number (ℝ)

               Dataset A  Dataset B
Distinct       73         77
Distinct (%)   20.3%      21.4%
Missing        86         87
Missing (%)    19.3%      19.5%
Infinite       0          0
Infinite (%)   0.0%       0.0%
Mean           30.095833  29.697772
Minimum        0.67       0.42
Maximum        71         71
Zeros          0          0
Zeros (%)      0.0%       0.0%
Negative       0          0
Negative (%)   0.0%       0.0%
Memory size    7.0 KiB    7.0 KiB
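
The Age percentages appear to use two different denominators: Missing (%) is consistent with missing values divided by all 446 rows, while Distinct (%) is consistent with distinct values divided by the non-missing rows only. This is an inference from the numbers, not something the report states. A sketch that reproduces both under that assumption:

```python
N_ROWS = 446  # observations per dataset, from the Overview

# Counts copied from the Age summary.
age = {
    "Dataset A": {"distinct": 73, "missing": 86},
    "Dataset B": {"distinct": 77, "missing": 87},
}

summary = {}
for name, s in age.items():
    non_missing = N_ROWS - s["missing"]
    summary[name] = {
        # Missing share is taken over all rows.
        "missing_pct": round(100.0 * s["missing"] / N_ROWS, 1),
        # Distinct share is assumed to be taken over non-missing rows.
        "distinct_pct": round(100.0 * s["distinct"] / non_missing, 1),
    }

for name, row in summary.items():
    print(name, row)
```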

Quantile statistics

                           Dataset A  Dataset B
Minimum                    0.67       0.42
5-th percentile            4          4.9
Q1                         21         20.75
median                     30         28
Q3                         39         38
95-th percentile           55.525     56
Maximum                    71         71
Range                      70.33      70.58
Interquartile range (IQR)  18         17.25

Descriptive statistics

                                 Dataset A      Dataset B
Standard deviation               14.084735      14.367229
Coefficient of variation (CV)    0.46799617     0.48378139
Kurtosis                         0.15019465     0.049728962
Mean                             30.095833      29.697772
Median Absolute Deviation (MAD)  9              9
Skewness                         0.29235308     0.39198147
Sum                              10834.5        10661.5
Variance                         198.37975      206.41728
Monotonicity                     Not monotonic  Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
36 15
 
3.4%
19 14
 
3.1%
30 14
 
3.1%
18 14
 
3.1%
22 13
 
2.9%
31 12
 
2.7%
27 12
 
2.7%
32 12
 
2.7%
24 11
 
2.5%
40 10
 
2.2%
Other values (63) 233
52.2%
(Missing) 86
 
19.3%
ValueCountFrequency (%)
24 19
 
4.3%
21 16
 
3.6%
22 16
 
3.6%
18 14
 
3.1%
19 13
 
2.9%
36 13
 
2.9%
32 12
 
2.7%
30 12
 
2.7%
28 11
 
2.5%
29 10
 
2.2%
Other values (67) 223
50.0%
(Missing) 87
 
19.5%
ValueCountFrequency (%)
0.67 1
 
0.2%
0.83 1
 
0.2%
1 3
0.7%
2 6
1.3%
3 3
0.7%
4 5
1.1%
5 1
 
0.2%
6 2
 
0.4%
7 1
 
0.2%
8 1
 
0.2%
ValueCountFrequency (%)
0.42 1
 
0.2%
0.75 1
 
0.2%
0.83 1
 
0.2%
1 3
0.7%
2 4
0.9%
3 2
 
0.4%
4 6
1.3%
5 2
 
0.4%
6 1
 
0.2%
7 3
0.7%

SibSp
Real number (ℝ)

               Dataset A   Dataset B
Distinct       7           7
Distinct (%)   1.6%        1.6%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           0.55829596  0.50224215
Minimum        0           0
Maximum        8           8
Zeros          297         312
Zeros (%)      66.6%       70.0%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                           Dataset A  Dataset B
Minimum                    0          0
5-th percentile            0          0
Q1                         0          0
median                     0          0
Q3                         1          1
95-th percentile           3          2
Maximum                    8          8
Range                      8          8
Interquartile range (IQR)  1          1

Descriptive statistics

                                 Dataset A      Dataset B
Standard deviation               1.1727087      1.1112274
Coefficient of variation (CV)    2.1005143      2.2125332
Kurtosis                         18.271441      19.597753
Mean                             0.55829596     0.50224215
Median Absolute Deviation (MAD)  0              0
Skewness                         3.7852073      3.8980696
Sum                              249            224
Variance                         1.3752456      1.2348264
Monotonicity                     Not monotonic  Not monotonic
Histogram with fixed size bins (bins=7)
Dataset A
Value  Count  Frequency (%)
0      297    66.6%
1      111    24.9%
2      13     2.9%
3      10     2.2%
4      8      1.8%
8      5      1.1%
5      2      0.4%

Dataset B
Value  Count  Frequency (%)
0      312    70.0%
1      98     22.0%
2      15     3.4%
4      9      2.0%
3      6      1.3%
8      4      0.9%
5      2      0.4%
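
Because SibSp takes only seven distinct values, its sum, mean and variance can be recomputed directly from the value counts. The sketch below assumes the reported variance is the sample variance (denominator n − 1); the report does not say so, but the figures agree with that choice.

```python
# Value counts copied from the SibSp common-values tables.
sibsp_a = {0: 297, 1: 111, 2: 13, 3: 10, 4: 8, 8: 5, 5: 2}
sibsp_b = {0: 312, 1: 98, 2: 15, 4: 9, 3: 6, 8: 4, 5: 2}


def moments(counts: dict) -> dict:
    """Sum, mean and sample variance (n - 1 denominator) from value counts."""
    n = sum(counts.values())
    total = sum(value * c for value, c in counts.items())
    mean = total / n
    squared_dev = sum(c * (value - mean) ** 2 for value, c in counts.items())
    return {"n": n, "sum": total, "mean": mean, "variance": squared_dev / (n - 1)}


m_a = moments(sibsp_a)
m_b = moments(sibsp_b)
print(m_a)
print(m_b)
```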

Dataset A
Value  Count  Frequency (%)
0      297    66.6%
1      111    24.9%
2      13     2.9%
3      10     2.2%
4      8      1.8%
5      2      0.4%
8      5      1.1%

Dataset B
Value  Count  Frequency (%)
0      312    70.0%
1      98     22.0%
2      15     3.4%
3      6      1.3%
4      9      2.0%
5      2      0.4%
8      4      0.9%

Parch
Real number (ℝ)

               Dataset A   Dataset B
Distinct       7           6
Distinct (%)   1.6%        1.3%
Missing        0           0
Missing (%)    0.0%        0.0%
Infinite       0           0
Infinite (%)   0.0%        0.0%
Mean           0.38565022  0.37668161
Minimum        0           0
Maximum        6           5
Zeros          339         344
Zeros (%)      76.0%       77.1%
Negative       0           0
Negative (%)   0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Quantile statistics

                           Dataset A  Dataset B
Minimum                    0          0
5-th percentile            0          0
Q1                         0          0
median                     0          0
Q3                         0          0
95-th percentile           2          2
Maximum                    6          5
Range                      6          5
Interquartile range (IQR)  0          0

Descriptive statistics

                                 Dataset A      Dataset B
Standard deviation               0.83419633     0.81380986
Coefficient of variation (CV)    2.1630905      2.1604714
Kurtosis                         11.541467      9.4840994
Mean                             0.38565022     0.37668161
Median Absolute Deviation (MAD)  0              0
Skewness                         2.9792018      2.7521166
Sum                              172            168
Variance                         0.69588351     0.66228649
Monotonicity                     Not monotonic  Not monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
0 339
76.0%
1 62
 
13.9%
2 36
 
8.1%
5 3
 
0.7%
3 3
 
0.7%
4 2
 
0.4%
6 1
 
0.2%
ValueCountFrequency (%)
0 344
77.1%
1 52
 
11.7%
2 43
 
9.6%
5 4
 
0.9%
3 2
 
0.4%
4 1
 
0.2%
ValueCountFrequency (%)
0 339
76.0%
1 62
 
13.9%
2 36
 
8.1%
3 3
 
0.7%
4 2
 
0.4%
5 3
 
0.7%
6 1
 
0.2%
ValueCountFrequency (%)
0 344
77.1%
1 52
 
11.7%
2 43
 
9.6%
3 2
 
0.4%
4 1
 
0.2%
5 4
 
0.9%

Ticket
Text

              Dataset A  Dataset B
Distinct      377        386
Distinct (%)  84.5%      86.5%
Missing       0          0
Missing (%)   0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Length

               Dataset A  Dataset B
Max length     18         18
Median length  17         17
Mean length    6.896861   6.7152466
Min length     3          3

Characters and Unicode

                     Dataset A  Dataset B
Total characters     3076       2995
Distinct characters  35         35
Distinct categories  5          5
Distinct scripts     2          2
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A  Dataset B
Unique      326        341
Unique (%)  73.1%      76.5%

Sample

         Dataset A      Dataset B
1st row  SC/PARIS 2167  237789
2nd row  250646         248733
3rd row  26707          343095
4th row  363291         35273
5th row  3101295        W./C. 6608
ValueCountFrequency (%)
pc 23
 
4.0%
c.a 13
 
2.3%
a/5 10
 
1.8%
2 8
 
1.4%
ca 8
 
1.4%
ston/o 8
 
1.4%
1601 6
 
1.1%
soton/o.q 5
 
0.9%
a/4 5
 
0.9%
2343 5
 
0.9%
Other values (398) 479
84.0%
ValueCountFrequency (%)
pc 32
 
5.6%
c.a 11
 
1.9%
a/5 10
 
1.8%
ston/o 7
 
1.2%
2 7
 
1.2%
ca 6
 
1.1%
347082 5
 
0.9%
1601 5
 
0.9%
w./c 5
 
0.9%
soton/o.q 4
 
0.7%
Other values (410) 476
83.8%

Most occurring characters

ValueCountFrequency (%)
3 392
12.7%
1 333
10.8%
2 313
10.2%
4 252
 
8.2%
7 235
 
7.6%
0 213
 
6.9%
6 201
 
6.5%
5 188
 
6.1%
9 163
 
5.3%
8 147
 
4.8%
Other values (25) 639
20.8%
ValueCountFrequency (%)
3 380
12.7%
1 317
10.6%
2 286
9.5%
7 250
8.3%
4 225
 
7.5%
6 225
 
7.5%
0 216
 
7.2%
5 208
 
6.9%
9 142
 
4.7%
8 141
 
4.7%
Other values (25) 605
20.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2437
79.2%
Uppercase Letter 344
 
11.2%
Other Punctuation 162
 
5.3%
Space Separator 124
 
4.0%
Lowercase Letter 9
 
0.3%
ValueCountFrequency (%)
Decimal Number 2390
79.8%
Uppercase Letter 323
 
10.8%
Other Punctuation 148
 
4.9%
Space Separator 122
 
4.1%
Lowercase Letter 12
 
0.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 392
16.1%
1 333
13.7%
2 313
12.8%
4 252
10.3%
7 235
9.6%
0 213
8.7%
6 201
8.2%
5 188
7.7%
9 163
6.7%
8 147
 
6.0%
ValueCountFrequency (%)
3 380
15.9%
1 317
13.3%
2 286
12.0%
7 250
10.5%
4 225
9.4%
6 225
9.4%
0 216
9.0%
5 208
8.7%
9 142
 
5.9%
8 141
 
5.9%
Space Separator
ValueCountFrequency (%)
124
100.0%
ValueCountFrequency (%)
122
100.0%
Other Punctuation
ValueCountFrequency (%)
. 106
65.4%
/ 56
34.6%
ValueCountFrequency (%)
. 98
66.2%
/ 50
33.8%
Uppercase Letter
ValueCountFrequency (%)
C 67
19.5%
O 64
18.6%
A 45
13.1%
S 42
12.2%
P 39
11.3%
N 26
 
7.6%
T 24
 
7.0%
Q 9
 
2.6%
W 8
 
2.3%
I 5
 
1.5%
Other values (6) 15
 
4.4%
ValueCountFrequency (%)
C 76
23.5%
P 49
15.2%
O 49
15.2%
A 40
12.4%
S 37
11.5%
N 20
 
6.2%
T 18
 
5.6%
W 8
 
2.5%
Q 7
 
2.2%
I 5
 
1.5%
Other values (6) 14
 
4.3%
Lowercase Letter
ValueCountFrequency (%)
a 3
33.3%
s 2
22.2%
l 1
 
11.1%
e 1
 
11.1%
r 1
 
11.1%
i 1
 
11.1%
ValueCountFrequency (%)
a 3
25.0%
s 3
25.0%
r 2
16.7%
i 2
16.7%
l 1
 
8.3%
e 1
 
8.3%

Most occurring scripts

ValueCountFrequency (%)
Common 2723
88.5%
Latin 353
 
11.5%
ValueCountFrequency (%)
Common 2660
88.8%
Latin 335
 
11.2%

Most frequent character per script

Common
ValueCountFrequency (%)
3 392
14.4%
1 333
12.2%
2 313
11.5%
4 252
9.3%
7 235
8.6%
0 213
7.8%
6 201
7.4%
5 188
6.9%
9 163
6.0%
8 147
 
5.4%
Other values (3) 286
10.5%
ValueCountFrequency (%)
3 380
14.3%
1 317
11.9%
2 286
10.8%
7 250
9.4%
4 225
8.5%
6 225
8.5%
0 216
8.1%
5 208
7.8%
9 142
 
5.3%
8 141
 
5.3%
Other values (3) 270
10.2%
Latin
ValueCountFrequency (%)
C 67
19.0%
O 64
18.1%
A 45
12.7%
S 42
11.9%
P 39
11.0%
N 26
 
7.4%
T 24
 
6.8%
Q 9
 
2.5%
W 8
 
2.3%
I 5
 
1.4%
Other values (12) 24
 
6.8%
ValueCountFrequency (%)
C 76
22.7%
P 49
14.6%
O 49
14.6%
A 40
11.9%
S 37
11.0%
N 20
 
6.0%
T 18
 
5.4%
W 8
 
2.4%
Q 7
 
2.1%
I 5
 
1.5%
Other values (12) 26
 
7.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3076
100.0%
ValueCountFrequency (%)
ASCII 2995
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 392
12.7%
1 333
10.8%
2 313
10.2%
4 252
 
8.2%
7 235
 
7.6%
0 213
 
6.9%
6 201
 
6.5%
5 188
 
6.1%
9 163
 
5.3%
8 147
 
4.8%
Other values (25) 639
20.8%
ValueCountFrequency (%)
3 380
12.7%
1 317
10.6%
2 286
9.5%
7 250
8.3%
4 225
 
7.5%
6 225
 
7.5%
0 216
 
7.2%
5 208
 
6.9%
9 142
 
4.7%
8 141
 
4.7%
Other values (25) 605
20.2%

Fare
Real number (ℝ)

               Dataset A  Dataset B
Distinct       171        180
Distinct (%)   38.3%      40.4%
Missing        0          0
Missing (%)    0.0%       0.0%
Infinite       0          0
Infinite (%)   0.0%       0.0%
Mean           29.631267  31.612088
Minimum        0          0
Maximum        512.3292   512.3292
Zeros          4          5
Zeros (%)      0.9%       1.1%
Negative       0          0
Negative (%)   0.0%       0.0%
Memory size    7.0 KiB    7.0 KiB

Quantile statistics

                           Dataset A  Dataset B
Minimum                    0          0
5-th percentile            7.15       7.225
Q1                         7.925      7.925
median                     14.4542    14.4542
Q3                         29.125     30.5
95-th percentile           90         106.425
Maximum                    512.3292   512.3292
Range                      512.3292   512.3292
Interquartile range (IQR)  21.2       22.575

Descriptive statistics

                                 Dataset A      Dataset B
Standard deviation               48.123386      53.362892
Coefficient of variation (CV)    1.6240745      1.6880533
Kurtosis                         48.501181      45.197189
Mean                             29.631267      31.612088
Median Absolute Deviation (MAD)  6.76255        6.7167
Skewness                         5.9002012      5.8543282
Sum                              13215.545      14098.991
Variance                         2315.8603      2847.5982
Monotonicity                     Not monotonic  Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
8.05 26
 
5.8%
26 22
 
4.9%
7.8958 18
 
4.0%
13 17
 
3.8%
7.75 17
 
3.8%
7.925 14
 
3.1%
10.5 13
 
2.9%
7.225 8
 
1.8%
26.55 8
 
1.8%
7.25 7
 
1.6%
Other values (161) 296
66.4%
ValueCountFrequency (%)
8.05 21
 
4.7%
13 20
 
4.5%
7.75 20
 
4.5%
26 17
 
3.8%
7.8958 15
 
3.4%
10.5 11
 
2.5%
7.2292 10
 
2.2%
7.775 9
 
2.0%
26.55 8
 
1.8%
7.8542 8
 
1.8%
Other values (170) 307
68.8%
ValueCountFrequency (%)
0 4
0.9%
4.0125 1
 
0.2%
5 1
 
0.2%
6.2375 1
 
0.2%
6.45 1
 
0.2%
6.4958 2
0.4%
6.75 1
 
0.2%
6.8583 1
 
0.2%
6.95 1
 
0.2%
6.975 1
 
0.2%
ValueCountFrequency (%)
0 5
1.1%
4.0125 1
 
0.2%
6.2375 1
 
0.2%
6.4375 1
 
0.2%
6.45 1
 
0.2%
6.4958 1
 
0.2%
6.75 1
 
0.2%
6.975 1
 
0.2%
7.0458 1
 
0.2%
7.05 2
 
0.4%

Cabin
Text

              Dataset A  Dataset B
Distinct      72         88
Distinct (%)  85.7%      87.1%
Missing       362        345
Missing (%)   81.2%      77.4%
Memory size   7.0 KiB    7.0 KiB

Length

               Dataset A  Dataset B
Max length     11         15
Median length  3          3
Mean length    3.4761905  3.3069307
Min length     1          1

Characters and Unicode

                     Dataset A  Dataset B
Total characters     292        334
Distinct characters  18         18
Distinct categories  3          3
Distinct scripts     2          2
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A  Dataset B
Unique      61         78
Unique (%)  72.6%      77.2%

Sample

         Dataset A  Dataset B
1st row  B80        F33
2nd row  C92        D48
3rd row  E121       C90
4th row  C30        B28
5th row  C123       B51 B53 B55
ValueCountFrequency (%)
f2 3
 
3.1%
b51 2
 
2.1%
e33 2
 
2.1%
e121 2
 
2.1%
c25 2
 
2.1%
c23 2
 
2.1%
e8 2
 
2.1%
b55 2
 
2.1%
b53 2
 
2.1%
c27 2
 
2.1%
Other values (69) 76
78.4%
ValueCountFrequency (%)
g6 3
 
2.7%
f2 3
 
2.7%
d 3
 
2.7%
e33 2
 
1.8%
b28 2
 
1.8%
e101 2
 
1.8%
c65 2
 
1.8%
c125 2
 
1.8%
b35 2
 
1.8%
e44 2
 
1.8%
Other values (87) 87
79.1%

Most occurring characters

ValueCountFrequency (%)
C 28
 
9.6%
2 27
 
9.2%
1 27
 
9.2%
3 26
 
8.9%
B 24
 
8.2%
6 16
 
5.5%
5 16
 
5.5%
8 16
 
5.5%
7 16
 
5.5%
0 15
 
5.1%
Other values (8) 81
27.7%
ValueCountFrequency (%)
1 34
 
10.2%
C 27
 
8.1%
5 25
 
7.5%
2 25
 
7.5%
B 24
 
7.2%
6 24
 
7.2%
4 23
 
6.9%
E 22
 
6.6%
3 22
 
6.6%
8 21
 
6.3%
Other values (8) 87
26.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 182
62.3%
Uppercase Letter 97
33.2%
Space Separator 13
 
4.5%
ValueCountFrequency (%)
Decimal Number 215
64.4%
Uppercase Letter 110
32.9%
Space Separator 9
 
2.7%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 28
28.9%
B 24
24.7%
E 15
15.5%
D 13
13.4%
F 8
 
8.2%
A 6
 
6.2%
G 3
 
3.1%
ValueCountFrequency (%)
C 27
24.5%
B 24
21.8%
E 22
20.0%
D 19
17.3%
A 9
 
8.2%
F 6
 
5.5%
G 3
 
2.7%
Decimal Number
ValueCountFrequency (%)
2 27
14.8%
1 27
14.8%
3 26
14.3%
6 16
8.8%
5 16
8.8%
8 16
8.8%
7 16
8.8%
0 15
8.2%
9 12
6.6%
4 11
6.0%
ValueCountFrequency (%)
1 34
15.8%
5 25
11.6%
2 25
11.6%
6 24
11.2%
4 23
10.7%
3 22
10.2%
8 21
9.8%
0 18
8.4%
7 12
 
5.6%
9 11
 
5.1%
Space Separator
ValueCountFrequency (%)
13
100.0%
ValueCountFrequency (%)
9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 195
66.8%
Latin 97
33.2%
ValueCountFrequency (%)
Common 224
67.1%
Latin 110
32.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 28
28.9%
B 24
24.7%
E 15
15.5%
D 13
13.4%
F 8
 
8.2%
A 6
 
6.2%
G 3
 
3.1%
ValueCountFrequency (%)
C 27
24.5%
B 24
21.8%
E 22
20.0%
D 19
17.3%
A 9
 
8.2%
F 6
 
5.5%
G 3
 
2.7%
Common
ValueCountFrequency (%)
2 27
13.8%
1 27
13.8%
3 26
13.3%
6 16
8.2%
5 16
8.2%
8 16
8.2%
7 16
8.2%
0 15
7.7%
13
6.7%
9 12
6.2%
ValueCountFrequency (%)
1 34
15.2%
5 25
11.2%
2 25
11.2%
6 24
10.7%
4 23
10.3%
3 22
9.8%
8 21
9.4%
0 18
8.0%
7 12
 
5.4%
9 11
 
4.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 292
100.0%
ValueCountFrequency (%)
ASCII 334
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
C 28
 
9.6%
2 27
 
9.2%
1 27
 
9.2%
3 26
 
8.9%
B 24
 
8.2%
6 16
 
5.5%
5 16
 
5.5%
8 16
 
5.5%
7 16
 
5.5%
0 15
 
5.1%
Other values (8) 81
27.7%
ValueCountFrequency (%)
1 34
 
10.2%
C 27
 
8.1%
5 25
 
7.5%
2 25
 
7.5%
B 24
 
7.2%
6 24
 
7.2%
4 23
 
6.9%
E 22
 
6.6%
3 22
 
6.6%
8 21
 
6.3%
Other values (8) 87
26.0%

Embarked
Categorical

              Dataset A  Dataset B
Distinct      3          3
Distinct (%)  0.7%       0.7%
Missing       0          2
Missing (%)   0.0%       0.4%
Memory size   7.0 KiB    7.0 KiB

Value  Dataset A  Dataset B
S      331        303
C      75         98
Q      40         43

Length

               Dataset A  Dataset B
Max length     1          1
Median length  1          1
Mean length    1          1
Min length     1          1

Characters and Unicode

                     Dataset A  Dataset B
Total characters     446        444
Distinct characters  3          3
Distinct categories  1          1
Distinct scripts     1          1
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A  Dataset B
Unique      0          0
Unique (%)  0.0%       0.0%

Sample

         Dataset A  Dataset B
1st row  C          S
2nd row  S          S
3rd row  S          S
4th row  S          C
5th row  S          S

Common Values

Dataset A
Value  Count  Frequency (%)
S      331    74.2%
C      75     16.8%
Q      40     9.0%

Dataset B
Value      Count  Frequency (%)
S          303    67.9%
C          98     22.0%
Q          43     9.6%
(Missing)  2      0.4%
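
For Dataset B, the Embarked frequencies differ slightly between tables (for example 67.9% for S here versus 68.2% in the character-level tables of this section). The difference is consistent with the two missing rows being included in the denominator here and excluded there; this is inferred from the numbers rather than stated by the report. A sketch of both conventions, with an illustrative function name:

```python
# Dataset B counts copied from the Embarked common-values table.
counts_b = {"S": 303, "C": 98, "Q": 43}
missing_b = 2


def frequencies(counts: dict, missing: int, include_missing: bool) -> dict:
    """Percentage per value, with or without missing rows in the denominator."""
    denominator = sum(counts.values()) + (missing if include_missing else 0)
    return {value: round(100.0 * c / denominator, 1) for value, c in counts.items()}


with_missing = frequencies(counts_b, missing_b, include_missing=True)
without_missing = frequencies(counts_b, missing_b, include_missing=False)
print(with_missing)
print(without_missing)
```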

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: one plot each for Dataset A and Dataset B]
ValueCountFrequency (%)
s 331
74.2%
c 75
 
16.8%
q 40
 
9.0%
ValueCountFrequency (%)
s 303
68.2%
c 98
 
22.1%
q 43
 
9.7%

Most occurring characters

ValueCountFrequency (%)
S 331
74.2%
C 75
 
16.8%
Q 40
 
9.0%
ValueCountFrequency (%)
S 303
68.2%
C 98
 
22.1%
Q 43
 
9.7%

Most occurring categories

Dataset A
Value             Count  Frequency (%)
Uppercase Letter  446    100.0%

Dataset B
Value             Count  Frequency (%)
Uppercase Letter  444    100.0%

Most frequent character per category

Uppercase Letter
Dataset A
Value  Count  Frequency (%)
S      331    74.2%
C      75     16.8%
Q      40     9.0%

Dataset B
Value  Count  Frequency (%)
S      303    68.2%
C      98     22.1%
Q      43     9.7%

Most occurring scripts

Dataset A
Value  Count  Frequency (%)
Latin  446    100.0%

Dataset B
Value  Count  Frequency (%)
Latin  444    100.0%

Most frequent character per script

Latin
Dataset A
Value  Count  Frequency (%)
S      331    74.2%
C      75     16.8%
Q      40     9.0%

Dataset B
Value  Count  Frequency (%)
S      303    68.2%
C      98     22.1%
Q      43     9.7%

Most occurring blocks

Dataset A
Value  Count  Frequency (%)
ASCII  446    100.0%

Dataset B
Value  Count  Frequency (%)
ASCII  444    100.0%

Most frequent character per block

ASCII
Dataset A
Value  Count  Frequency (%)
S      331    74.2%
C      75     16.8%
Q      40     9.0%

Dataset B
Value  Count  Frequency (%)
S      303    68.2%
C      98     22.1%
Q      43     9.7%

Interactions

[25 interaction plots each for Dataset A and Dataset B; figures not reproduced]

Correlations

[Correlation heatmaps for Dataset A and Dataset B not reproduced]

Dataset A

Variable     PassengerId  Age     SibSp   Parch   Fare    Survived  Pclass  Sex     Embarked
PassengerId  1.000        0.049   -0.050  0.027   -0.029  0.107     0.000   0.050   0.000
Age          0.049        1.000   -0.170  -0.177  0.139   0.146     0.252   0.085   0.079
SibSp        -0.050       -0.170  1.000   0.437   0.477   0.144     0.154   0.209   0.000
Parch        0.027        -0.177  0.437   1.000   0.401   0.107     0.000   0.254   0.000
Fare         -0.029       0.139   0.477   0.401   1.000   0.301     0.446   0.146   0.235
Survived     0.107        0.146   0.144   0.107   0.301   1.000     0.379   0.500   0.038
Pclass       0.000        0.252   0.154   0.000   0.446   0.379     1.000   0.147   0.261
Sex          0.050        0.085   0.209   0.254   0.146   0.500     0.147   1.000   0.058
Embarked     0.000        0.079   0.000   0.000   0.235   0.038     0.261   0.058   1.000

Dataset B

Variable     PassengerId  Age     SibSp   Parch   Fare    Survived  Pclass  Sex     Embarked
PassengerId  1.000        0.041   -0.049  -0.040  -0.010  0.071     0.000   0.000   0.000
Age          0.041        1.000   -0.184  -0.273  0.144   0.173     0.265   0.100   0.000
SibSp        -0.049       -0.184  1.000   0.428   0.428   0.154     0.096   0.182   0.000
Parch        -0.040       -0.273  0.428   1.000   0.404   0.168     0.000   0.200   0.029
Fare         -0.010       0.144   0.428   0.404   1.000   0.268     0.463   0.150   0.240
Survived     0.071        0.173   0.154   0.168   0.268   1.000     0.317   0.555   0.166
Pclass       0.000        0.265   0.096   0.000   0.463   0.317     1.000   0.126   0.301
Sex          0.000        0.100   0.182   0.200   0.150   0.555     0.126   1.000   0.099
Embarked     0.000        0.000   0.000   0.029   0.240   0.166     0.301   0.099   1.000

Missing values

[Three figures each for Dataset A and Dataset B; figures not reproduced. Their captions follow.]

A simple visualization of nullity by column.

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
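
One common way to quantify nullity correlation is the Pearson correlation between the missingness indicators of two columns. The report does not state its formula, so treat the sketch below as an assumption; the function name `nullity_correlation` and the toy columns are hypothetical and are not taken from either dataset.

```python
from math import sqrt

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when a column is never (or always) missing
    return cov / sqrt(var_a * var_b)

# Toy columns (illustrative values only); None marks a missing entry
age = [29.0, None, 44.0, None, 14.0, 17.0]
cabin = [None, None, "X1", None, "X2", None]
fare = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

r_age_cabin = nullity_correlation(age, cabin)
r_age_fare = nullity_correlation(age, fare)
print(r_age_cabin, r_age_fare)
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and a fully populated column such as `fare` here has no defined nullity correlation.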

Sample

Dataset A

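(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
361 | 362 | 0 | 2 | del Carlo, Mr. Sebastiano | male | 29.0 | 1 | 0 | SC/PARIS 2167 | 27.7208 | NaN | C
213 | 214 | 0 | 2 | Givard, Mr. Hans Kristensen | male | 30.0 | 0 | 0 | 250646 | 13.0000 | NaN | S
236 | 237 | 0 | 2 | Hold, Mr. Stephen | male | 44.0 | 1 | 0 | 26707 | 26.0000 | NaN | S
328 | 329 | 1 | 3 | Goldsmith, Mrs. Frank John (Emily Alice Brown) | female | 31.0 | 1 | 1 | 363291 | 20.5250 | NaN | S
686 | 687 | 0 | 3 | Panula, Mr. Jaako Arnold | male | 14.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S
114 | 115 | 0 | 3 | Attalah, Miss. Malake | female | 17.0 | 0 | 0 | 2627 | 14.4583 | NaN | C
762 | 763 | 1 | 3 | Barah, Mr. Hanna Assi | male | 20.0 | 0 | 0 | 2663 | 7.2292 | NaN | C
260 | 261 | 0 | 3 | Smith, Mr. Thomas | male | NaN | 0 | 0 | 384461 | 7.7500 | NaN | Q
808 | 809 | 0 | 2 | Meyer, Mr. August | male | 39.0 | 0 | 0 | 248723 | 13.0000 | NaN | S
570 | 571 | 1 | 2 | Harris, Mr. George | male | 62.0 | 0 | 0 | S.W./PP 752 | 10.5000 | NaN | S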

Dataset B

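(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
580 | 581 | 1 | 2 | Christy, Miss. Julie Rachel | female | 25.0 | 1 | 1 | 237789 | 30.0000 | NaN | S
345 | 346 | 1 | 2 | Brown, Miss. Amelia "Mildred" | female | 24.0 | 0 | 0 | 248733 | 13.0000 | F33 | S
415 | 416 | 0 | 3 | Meek, Mrs. Thomas (Annie Louise Rowley) | female | NaN | 0 | 0 | 343095 | 8.0500 | NaN | S
659 | 660 | 0 | 1 | Newell, Mr. Arthur Webster | male | 58.0 | 0 | 2 | 35273 | 113.2750 | D48 | C
86 | 87 | 0 | 3 | Ford, Mr. William Neal | male | 16.0 | 1 | 3 | W./C. 6608 | 34.3750 | NaN | S
350 | 351 | 0 | 3 | Odahl, Mr. Nils Martin | male | 23.0 | 0 | 0 | 7267 | 9.2250 | NaN | S
710 | 711 | 1 | 1 | Mayne, Mlle. Berthe Antonine ("Mrs de Villiers") | female | 24.0 | 0 | 0 | PC 17482 | 49.5042 | C90 | C
566 | 567 | 0 | 3 | Stoytcheff, Mr. Ilia | male | 19.0 | 0 | 0 | 349205 | 7.8958 | NaN | S
890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q
863 | 864 | 0 | 3 | Sage, Miss. Dorothy Edith "Dolly" | female | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S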

Dataset A

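(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
325 | 326 | 1 | 1 | Young, Miss. Marie Grice | female | 36.0 | 0 | 0 | PC 17760 | 135.6333 | C32 | C
202 | 203 | 0 | 3 | Johanson, Mr. Jakob Alfred | male | 34.0 | 0 | 0 | 3101264 | 6.4958 | NaN | S
398 | 399 | 0 | 2 | Pain, Dr. Alfred | male | 23.0 | 0 | 0 | 244278 | 10.5000 | NaN | S
651 | 652 | 1 | 2 | Doling, Miss. Elsie | female | 18.0 | 0 | 1 | 231919 | 23.0000 | NaN | S
44 | 45 | 1 | 3 | Devaney, Miss. Margaret Delia | female | 19.0 | 0 | 0 | 330958 | 7.8792 | NaN | Q
638 | 639 | 0 | 3 | Panula, Mrs. Juha (Maria Emilia Ojala) | female | 41.0 | 0 | 5 | 3101295 | 39.6875 | NaN | S
393 | 394 | 1 | 1 | Newell, Miss. Marjorie | female | 23.0 | 1 | 0 | 35273 | 113.2750 | D36 | C
318 | 319 | 1 | 1 | Wick, Miss. Mary Natalie | female | 31.0 | 0 | 2 | 36928 | 164.8667 | C7 | S
101 | 102 | 0 | 3 | Petroff, Mr. Pastcho ("Pentcho") | male | NaN | 0 | 0 | 349215 | 7.8958 | NaN | S
273 | 274 | 0 | 1 | Natsch, Mr. Charles H | male | 37.0 | 0 | 1 | PC 17596 | 29.7000 | C118 | C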

Dataset B

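(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
556 | 557 | 1 | 1 | Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan") | female | 48.0 | 1 | 0 | 11755 | 39.6000 | A16 | C
246 | 247 | 0 | 3 | Lindahl, Miss. Agda Thorilda Viktoria | female | 25.0 | 0 | 0 | 347071 | 7.7750 | NaN | S
864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S
739 | 740 | 0 | 3 | Nankoff, Mr. Minko | male | NaN | 0 | 0 | 349218 | 7.8958 | NaN | S
98 | 99 | 1 | 2 | Doling, Mrs. John T (Ada Julia Bone) | female | 34.0 | 0 | 1 | 231919 | 23.0000 | NaN | S
550 | 551 | 1 | 1 | Thayer, Mr. John Borland Jr | male | 17.0 | 0 | 2 | 17421 | 110.8833 | C70 | C
477 | 478 | 0 | 3 | Braund, Mr. Lewis Richard | male | 29.0 | 1 | 0 | 3460 | 7.0458 | NaN | S
427 | 428 | 1 | 2 | Phillips, Miss. Kate Florence ("Mrs Kate Louise Phillips Marshall") | female | 19.0 | 0 | 0 | 250655 | 26.0000 | NaN | S
462 | 463 | 0 | 1 | Gee, Mr. Arthur H | male | 47.0 | 0 | 0 | 111320 | 38.5000 | E63 | S
65 | 66 | 1 | 3 | Moubarek, Master. Gerios | male | NaN | 1 | 1 | 2661 | 15.2458 | NaN | C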

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.